NSF PAR Search | NSF Public Access Repository

Differentially private estimation of weighted average treatment effects for binary outcomes

https://doi.org/10.1016/j.csda.2025.108145

Guha, Sharmistha; Reiter, Jerome P (July 2025, Computational Statistics & Data Analysis)

Free, publicly-accessible full text available July 1, 2026
A Bayesian Multiplex Graph Classifier of Functional Brain Connectivity Across Diverse Tasks of Cognitive Control

https://doi.org/10.1007/s12021-024-09670-w

Guha, Sharmistha; Rodriguez-Acosta, Jose; Dinov, Ivo D (October 2024, Neuroinformatics)

This article investigates the impact of aging on functional connectivity across different cognitive control scenarios, with particular emphasis on identifying brain regions significantly associated with early aging. By conceptualizing functional connectivity within each cognitive control scenario as a graph, with brain regions as nodes, the statistical challenge becomes devising a regression framework to predict a binary scalar outcome (aging or normal) using multiple graph predictors. Popular regression methods utilizing multiplex graph predictors often face limitations in effectively harnessing information within and across graph layers, potentially leading to less accurate inference and prediction, especially for smaller sample sizes. To address this challenge, we propose the Bayesian Multiplex Graph Classifier (BMGC). Accounting for multiplex graph topology, our method models edge coefficients at each graph layer using bilinear interactions between the latent effects associated with the two nodes connected by the edge. This approach also employs a variable selection framework on node-specific latent effects from all graph layers to identify influential nodes linked to observed outcomes. Crucially, the proposed framework is computationally efficient and quantifies the uncertainty in node identification, coefficient estimation, and binary outcome prediction. BMGC outperforms alternative methods in terms of the aforementioned metrics in simulation studies. BMGC was further validated using an fMRI study of brain networks in adults. It found that the sensorimotor brain network obeys certain lateral symmetries, whereas the default mode network exhibits significant brain asymmetries associated with early aging.
Full Text Available
Covariate-Dependent Clustering of Undirected Networks with Brain-Imaging Data

https://doi.org/10.1080/00401706.2024.2321930

Guha, Sharmistha; Guhaniyogi, Rajarshi (March 2024, Technometrics)

Full Text Available
Regression-assisted Bayesian record linkage for causal inference in observational studies with covariates spread over two files

https://doi.org/10.1016/j.jspi.2023.07.004

Guha, Sharmistha; Reiter, Jerome P (March 2024, Journal of Statistical Planning and Inference)

We consider causal inference for observational studies with data spread over two files. One file includes the treatment, outcome, and some covariates measured on a set of individuals, and the other file includes additional causally-relevant covariates measured on a partially overlapping set of individuals. By linking records in the two databases, the analyst can control for more covariates, thereby reducing the risk of bias compared to using only one file. When analysts do not have access to a unique identifier that enables perfect, error-free linkages, they typically rely on probabilistic record linkage to construct a single linked data set, and estimate causal effects using these linked data. This typical practice does not propagate uncertainty from imperfect linkages to the causal inferences. Further, it does not take advantage of relationships among the variables to improve the linkage quality. We address these shortcomings by fusing regression-assisted, Bayesian probabilistic record linkage with causal inference. The Markov chain Monte Carlo sampler generates multiple plausible linked data files as byproducts that analysts can use for multiple imputation inferences. Here, we show results for two causal estimators based on propensity score overlap weights. Using simulations and data from the Italy Survey on Household Income and Wealth, we show that our approach can improve the accuracy of estimated treatment effects.
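As a rough illustration of what an estimator "based on propensity score overlap weights" can look like, the sketch below computes a weighted difference in mean outcomes, weighting treated units by one minus their propensity score and control units by their propensity score. This is a generic textbook form, not necessarily either of the two estimators used in the article; the function name `overlap_weighted_effect` and its arguments are hypothetical, and the propensity scores are assumed to be given rather than estimated.

```python
def overlap_weighted_effect(treated, outcome, propensity):
    """Weighted difference in mean outcomes using overlap weights.

    Assumption (not from the article): treated units get weight 1 - e(x),
    control units get weight e(x), where e(x) is a supplied propensity score.
    """
    num_t = den_t = num_c = den_c = 0.0
    for z, y, e in zip(treated, outcome, propensity):
        if z == 1:
            w = 1.0 - e          # overlap weight for a treated unit
            num_t += w * y
            den_t += w
        else:
            w = e                # overlap weight for a control unit
            num_c += w * y
            den_c += w
    # Difference of the two weighted outcome means
    return num_t / den_t - num_c / den_c


if __name__ == "__main__":
    z = [1, 1, 0, 0]
    y = [1, 0, 1, 0]
    e = [0.75, 0.25, 0.75, 0.25]
    print(overlap_weighted_effect(z, y, e))
```

In a multiple-imputation setting like the one the abstract describes, a function of this kind would be applied to each plausible linked data file and the results combined; that combining step is not shown here.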
Full Text Available
The association between long-term PM2.5 exposure and risk for pancreatic cancer: an application of social informatics

https://doi.org/10.1093/aje/kwae271

Bhavsar, Nrupen A; Jowers, Kay; Yang, Lexie Z; Guha, Sharmistha; Lin, Xuan; Peskoe, Sarah; McManus, Hannah; McElroy, Lisa; Bravo, Mercedes; Reiter, Jerome P; et al (August 2024, American Journal of Epidemiology)

There is a profound need to identify modifiable risk factors to screen for and prevent pancreatic cancer. Air pollution, including fine particulate matter (PM2.5), is increasingly recognized as a risk factor for cancer. We conducted a case-control study using data from the electronic health record (EHR) of Duke University Health System, 15-year residential history, NASA satellite fine particulate matter (PM2.5), and neighborhood socioeconomic data. Using deterministic and probabilistic linkage algorithms, we linked residential history and EHR data to quantify long-term PM2.5 exposure. Logistic regression models quantified the association between a 1 interquartile range (IQR) increase in PM2.5 concentration and pancreatic cancer risk. The study included 203 cases and 5027 controls (median age of 59 years, 62% female, 26% Black). Individuals with pancreatic cancer had higher average annual exposure (9.4 μg/m³) than controls. An IQR increase in average annual PM2.5 was associated with greater odds of pancreatic cancer (odds ratio = 1.20; 95% CI, 1.00-1.44). These findings highlight the link between elevated PM2.5 exposure and increased pancreatic cancer risk. They may inform screening strategies for high-risk populations and guide air pollution policies to mitigate exposure. This article is part of a Special Collection on Environmental Epidemiology.
Full Text Available
High-Dimensional Bayesian Network Classification with Network Global-Local Shrinkage Priors

https://doi.org/10.1214/23-BA1378

Guha, Sharmistha; Rodriguez, Abel (January 2023, Bayesian Analysis)

Full Text Available
Bayesian Generalized Sparse Symmetric Tensor-on-Vector Regression

https://doi.org/10.1080/00401706.2020.1784799

Guha, Sharmistha; Guhaniyogi, Rajarshi (June 2020, Technometrics)

Motivated by brain connectome datasets acquired using diffusion weighted magnetic resonance imaging (DWI), this article proposes a novel generalized Bayesian linear modeling framework with a symmetric tensor response and scalar predictors. The symmetric tensor coefficients corresponding to the scalar predictors are embedded with two features: low-rankness and group sparsity within the low-rank structure. Besides offering computational efficiency and parsimony, these two features enable identification of important “tensor nodes” and “tensor cells” significantly associated with the predictors, with characterization of uncertainty. The proposed framework is empirically investigated under various simulation settings and with a real brain connectome dataset. Theoretically, we establish that the posterior predictive density from the proposed model is “close” to the true data-generating density, with closeness measured by the Hellinger distance between the two densities. This distance scales at a rate very close to the finite-dimensional optimal rate, depending on how the number of tensor nodes grows with the sample size.
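For readers unfamiliar with the Hellinger distance mentioned above, the following is a minimal sketch for the simpler case of two discrete probability distributions on the same finite support. The abstract concerns continuous densities, so the discrete setting is an assumption made purely for illustration, and the function name `hellinger` is hypothetical. The sketch uses the convention with a factor of 1/2 inside the square root, which keeps the distance between 0 and 1; some texts omit that factor.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions p and q.

    Assumes p and q are equal-length sequences of non-negative
    probabilities that each sum to 1 (not checked here).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Sum of squared differences between square-root probabilities
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * total)


if __name__ == "__main__":
    print(hellinger([0.5, 0.5], [0.5, 0.5]))  # identical distributions
    print(hellinger([1.0, 0.0], [0.0, 1.0]))  # disjoint supports
```

Under this convention, identical distributions are at distance 0 and distributions with disjoint supports are at distance 1.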
Full Text Available
Bayesian Regression With Undirected Network Predictors With an Application to Brain Connectome Data

https://doi.org/10.1080/01621459.2020.1772079

Guha, Sharmistha; Rodriguez, Abel (January 2020, Journal of the American Statistical Association)

Full Text Available
